gradient descent work
How Projected Gradient Descent works in Machine Learning pipelines, Part 1
Abstract: This paper addresses a distributed convex optimization problem with a class of coupled constraints, which arise in a multi-agent system composed of multiple communities modeled by cliques. First, we propose a fully distributed gradient-based algorithm with a novel operator inspired by the convex projection, called the clique-based projection. Next, we scrutinize the convergence properties for both diminishing and fixed step sizes. For diminishing ones, we show the convergence to an optimal solution under the assumptions of the smoothness of an objective function and the compactness of the constraint set. Additionally, when the objective function is strongly monotone, the strict convergence to the unique solution is proved without the assumption of compactness.
How Projected Gradient Descent works in Machine Learning pipelines, Part 2
Abstract: The unit-modulus least squares (UMLS) problem has a wide spectrum of applications in signal processing, e.g., phase-only beamforming, phase retrieval, radar code design, and sensor network localization. Scalable first-order methods such as projected gradient descent (PGD) have recently been studied as a simple yet efficient approach to solving the UMLS problem. Existing results on the convergence of PGD for UMLS often focus on global convergence to stationary points. Because the problem is non-convex, only a sublinear convergence rate has been established. However, these results do not explain the fast convergence of PGD frequently observed in practice. This manuscript presents a novel analysis of the convergence of PGD for UMLS, justifying the linear convergence behavior of the algorithm near the solution.
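To make the PGD iteration mentioned in the abstract concrete, here is a minimal sketch, assuming the usual UMLS formulation: minimize ||Ax - b||² subject to |xᵢ| = 1 for every entry of x. Each iteration takes a gradient step and then projects every entry back onto the unit circle. The function names, the small matrix A, and the step size 1/3 (chosen here as one over the largest eigenvalue of AᴴA for this particular A) are illustrative assumptions, not taken from the paper.

```python
def matvec(A, x):
    """Return the product A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def conj_transpose_matvec(A, r):
    """Return A^H r, where A^H is the conjugate transpose of A."""
    n_rows, n_cols = len(A), len(A[0])
    return [sum(A[i][j].conjugate() * r[i] for i in range(n_rows))
            for j in range(n_cols)]


def project_unit_modulus(z):
    """Project each entry onto the unit circle: z_i -> z_i / |z_i|."""
    # A zero entry has no unique projection; mapping it to 1 is an arbitrary choice.
    return [zi / abs(zi) if abs(zi) > 0 else 1 + 0j for zi in z]


def objective(A, b, x):
    """Least-squares cost ||A x - b||^2."""
    return sum(abs(ri - bi) ** 2 for ri, bi in zip(matvec(A, x), b))


def pgd_umls(A, b, x0, step, iters=200):
    """Projected gradient descent: gradient step, then projection, repeated."""
    x = project_unit_modulus(x0)
    for _ in range(iters):
        residual = [ri - bi for ri, bi in zip(matvec(A, x), b)]
        # Gradient of the cost up to a constant factor, which is absorbed into `step`.
        grad = conj_transpose_matvec(A, residual)
        x = project_unit_modulus([xi - step * gi for xi, gi in zip(x, grad)])
    return x


# Hypothetical toy instance: b is generated from a known unit-modulus vector.
A = [[1, 0], [0, 1], [1, 1]]
x_true = [1j, -1 + 0j]
b = matvec(A, x_true)

x0 = [1 + 0j, 1j]
x_hat = pgd_umls(A, b, x0, step=1.0 / 3.0)
```

Comparing `objective(A, b, x0)` with `objective(A, b, x_hat)` shows how far the iteration has reduced the cost. With a step size no larger than one over the largest eigenvalue of AᴴA, the cost cannot increase from one iterate to the next, although convergence to the global minimizer is not guaranteed in general because the constraint set is non-convex.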
Why Gradient Descent Works?
Gradient descent is an iterative optimization algorithm used to fit the weights of a machine learning model (linear regression, neural networks, etc.) by minimizing that model's cost function. The intuition behind gradient descent is this: picture the cost function, denoted by f(Θ) where Θ = [Θ₁, …, Θₙ], plotted over its n parameters as a bowl. Imagine a randomly placed point on that bowl, given by n coordinates (these are the initial values of the parameters, and the height of the bowl at that point is the initial cost). The minimum of the function is then the bottom of the bowl. The goal is to reach the bottom of the bowl (that is, to minimize the cost) by repeatedly moving downhill, in the direction opposite to the gradient.
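The bowl picture can be sketched in a few lines of code. This is a minimal illustration, assuming the simplest possible bowl, f(Θ) = Θ₁² + … + Θₙ², whose bottom is at the origin. The function names, the learning rate of 0.1, and the starting point are hypothetical choices made for the example.

```python
def cost(theta):
    """Bowl-shaped cost: the sum of squared parameters (minimum at the origin)."""
    return sum(t * t for t in theta)


def gradient(theta):
    """Gradient of the cost above: the i-th component is 2 * theta_i."""
    return [2.0 * t for t in theta]


def gradient_descent(theta0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from theta0."""
    theta = list(theta0)
    history = [cost(theta)]  # cost at the starting point
    for _ in range(steps):
        grad = gradient(theta)
        # Move a small distance downhill along each coordinate.
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
        history.append(cost(theta))
    return theta, history


# Start somewhere on the side of the bowl and walk down.
theta_final, costs = gradient_descent([3.0, -4.0])
```

Here `costs` records the height on the bowl after every step, so it shrinks toward zero as `theta_final` approaches the bottom. A learning rate that is too large would overshoot the bottom instead of settling into it, which is why the step is kept small.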